Symmetric Eigenvalue Decomposition - Jacobi Method and High Relative Accuracy

The Jacobi method is the oldest method for EVD computations, dating back from 1864. The method does not require tridiagonalization. Instead, the method computes a sequence of orthogonally similar matrices which converge to a diagonal matrix of eigenvalues. In each step a simple plane rotation which sets one off-diagonal element to zero is performed.

For positive definite matrices, the method computes eigenvalues with high relative accuracy.

For more details, see I. Slapničar, Symmetric Matrix Eigenvalue Techniques and Z. Drmač, Computing Eigenvalues and Singular Values to High Relative Accuracy and the references therein.

Prerequisites

The reader should be familiar with concepts of eigenvalues and eigenvectors, related perturbation theory, and algorithms.

Competences

The reader should be able to recognise matrices which warrant high relative accuracy and to apply Jacobi method to them.

Jacobi method

$A$ is a real symmetric matrix of order $n$ and $A= U \Lambda U^T$ is its EVD.

Definitions

The Jacobi method forms a sequence of matrices,

$$ A_0=A, \qquad A_{k+1}=G(c,s,i_k,j_k) A_k G(c,s,i_k,j_k)^T, \qquad k=1,2,\ldots, $$

where $G(c,s,i_k,j_k)$ is the orthogonal plane rotation matrix. The parameters $c$ and $s$ are chosen such that

$$ [A_{k+1}]_{i_k j_k}=[A_{k+1}]_{j_k i_k}=0. $$

The plane rotation is also called Jacobi rotation.

The off-norm of $A$ is

$$ \| A\|_{\mathrm{off}}=\big(\sum_{i}\sum_{j\neq i} a_{ij}^2\big)^{1/2}, $$

that is, off-norm is the Frobenius norm of the matrix consisting of all off-diagonal elements of $A$.

The choice of pivot elements $[A_k]_{i_kj_k}$ is called the pivoting strategy.

The optimal pivoting strategy, originally used by Jacobi, chooses pivoting elements such that

$$ |[A_k]_{i_k j_k}|=\max_{i<j} |[A_k]_{ij}|. $$

The row-cyclic pivoting strategy chooses pivot elements in the systematic row-wise order,

$$ (1,2), (1,3), \ldots,(1,n),(2,3), (2,4),\ldots,(2,n),(3,4),\ldots,(n-1,n). $$

Similarly, the column-cyclic strategy chooses pivot elements column-wise.

One pass through all matrix elements is called cycle or sweep.

Facts

  1. The Jacobi rotations parameters $c$ and $s$ are computed as follows: If $[A_k]_{i_kj_k}=0$, then $c=1$ and $s=0$, otherwise \begin{align*} & \tau=\frac{[A_k]_{i_ki_k}-[A_k]_{j_kj_k} }{2[A_k]_{i_kj_k} },\qquad t=\frac{\mathop{\mathrm{sign}}(\tau)}{|\tau|+\sqrt{1+\tau^2}},\\ & c=\frac{1}{\sqrt{1+t^2}},\qquad s=c\cdot t. \end{align*}

  2. After each rotation, the off-norm decreases, $$ \|A_{k+1}\|_{\mathrm{off}}^2=\|A_{k}\|_{\mathrm{off}}^2-2[A_k]_{i_kj_k}^2. $$ With the appropriate pivoting strategy, the method converges in the sense that $$ \|A_{k}\|_{\mathrm{off}}\to 0,\qquad A_k\to\Lambda, \qquad \prod_{k=1}^{\infty} G(i_k,j_k,c,s)^T \to U. $$

  3. For the optimal pivoting strategy the square of the pivot element is greater than the average squared element, $$ [A_k]_{i_kj_k}^2\geq \frac{1}{n(n-1)}\, \|A_k\|_{\mathrm{off}}^2 . $$ Thus, $$ \|A_{k+1}\|_{\mathrm{off}}^2\leq\left(1-\frac{2}{n(n-1)}\right)\|A_{k}\|_{\mathrm{off}}^2 $$ and the method converges.

  4. For the row cyclic and the column cyclic pivoting strategies, the method converges. The convergence is ultimately quadratic in the sense that $$ \|A_{k+n(n-1)/2}\|_{\mathrm{off}} \leq\ const\cdot \|A_{k}\|_{\mathrm{off}}^2, $$ provided $\|A_{k}\|_{\mathrm{off}}$ is sufficiently small.

  5. The EVD computed by the Jacobi method satisfies the standard error bounds.

  6. The Jacobi method is suitable for parallel computation. There exist convergent parallel strategies which enable simultaneous execution of several rotations.

  7. The Jacobi method is simple, but it is slower than the methods based on tridiagonalization. It is conjectured that standard implementations require $O(n^3\log n)$ operations. More precisely, each cycle clearly requires $O(n^3)$ operations and it is conjectured that $\log n$ cycles are needed until convergence.

  8. If $A$ is positive definite, the method can be modified such that it reaches the speed of the methods based on tridiagonalization and at the same time computes the EVD with high relative accuracy.

Examples

$\begin{bmatrix} c & s\\-s& c\end{bmatrix}^T \begin{bmatrix} a & b\\ b & d\end{bmatrix} \begin{bmatrix} c & s\\-s& c\end{bmatrix} = \begin{bmatrix} \tilde a & 0 \\ 0 &\tilde b\end{bmatrix} $


In [31]:
using LinearAlgebra
function myJacobi(A::Array{T}) where T<:Real
    n=size(A,1)
    U=Matrix{T}(I,n,n)
    # Tolerance for rotation
    tol=sqrt(map(T,n))*eps(T)
    # Counters
    p=n*(n-1)/2
    sweep=0
    pcurrent=0
    # First criterion is for standard accuracy, second one is for relative accuracy
    while sweep<10 && norm(A-Diagonal(diag(A)))>tol
    # while sweep<30 && pcurrent<p
        sweep+=1
        # Row-cyclic strategy
        for i = 1 : n-1 
            for j = i+1 : n
                # Check for the tolerance - the first criterion is standard,
                # the second one is for relative accuracy for PD matrices               
                # if A[i,j]!=zero(T)
                if abs(A[i,j])>tol*sqrt(abs(A[i,i]*A[j,j]))
                    # Compute c and s
                    τ=(A[i,i]-A[j,j])/(2*A[i,j])
                    t=sign(τ)/(abs(τ)+sqrt(1+τ^2))
                    c=one(T)/sqrt(one(T)+t^2)
                    s=c*t
                    G=LinearAlgebra.Givens(i,j,c,s)
                    A=G*A
                    A*=G'
                    A[i,j]=zero(T)
                    A[j,i]=zero(T)
                    U*=G'
                    pcurrent=0
                    # To observe convergence
                    # display(A)
                else
                    pcurrent+=1
                end
            end
        end
        # display(A)
    end
    diag(A), U
end


Out[31]:
myJacobi (generic function with 1 method)

In [10]:
methodswith(LinearAlgebra.Givens);

In [11]:
import Random
Random.seed!(516)
n=4
A=Matrix(Symmetric(rand(n,n)))


Out[11]:
4×4 Array{Float64,2}:
 0.521525   0.890723  0.659431  0.0888795
 0.890723   0.701459  0.573688  0.52658
 0.659431   0.573688  0.97184   0.198953
 0.0888795  0.52658   0.198953  0.382115

In [20]:
λ,U=myJacobi(A)


4×4 Array{Float64,2}:
 -0.408254    0.109346   -0.0228404  -0.0175519
  0.109346    2.26498    -0.0255137   0.0218131
 -0.0228404  -0.0255137   0.228267    0.0
 -0.0175519   0.0218131   0.0         0.491942
4×4 Array{Float64,2}:
 -0.413833     -0.000467744  6.63545e-6  5.56983e-6
 -0.000467744   2.27004      1.06468e-5  3.62874e-8
  6.63545e-6    1.06468e-5   0.228661    0.0
  5.56983e-6    3.62874e-8   0.0         0.492069
4×4 Array{Float64,2}:
 -0.413833     -1.10193e-10  2.20873e-16  2.1888e-18
 -1.10193e-10   2.27004      1.13927e-18  2.48068e-28
  2.20873e-16   1.13927e-18  0.228661     0.0
  2.1888e-18    2.48068e-28  0.0          0.492069
4×4 Array{Float64,2}:
 -0.413833     -3.91653e-34  0.0          2.1888e-18
 -3.91653e-34   2.27004      1.13927e-18  1.58202e-28
  0.0           1.13927e-18  0.228661     7.52456e-34
  2.1888e-18    1.58202e-28  7.52456e-34  0.492069
Out[20]:
([-0.4138327966810198, 2.270041661404073, 0.22866142171277953, 0.49206900298499145], [0.6499796467782077 0.5246016723266144 -0.5411414516612844 -0.09739339536187192; -0.6504278236233745 0.5874545253279156 -0.2819898931928687 0.39028518683827634; -0.09479843875402832 0.5640827123872436 0.5435404743716127 -0.61431889301529; 0.38142281137923 0.24799387018578273 0.5763778119869594 0.6788256750882119])

In [21]:
# Orthogonality
U'*U


Out[21]:
4×4 Array{Float64,2}:
  1.0          -3.69916e-17  -3.08488e-16  -7.98352e-18
 -3.69916e-17   1.0          -1.33021e-17   7.96275e-18
 -3.08488e-16  -1.33021e-17   1.0          -1.37318e-16
 -7.98352e-18   7.96275e-18  -1.37318e-16   1.0

In [22]:
# Residual
A*U-U*Diagonal(λ)


Out[22]:
4×4 Array{Float64,2}:
  5.55112e-17  6.66134e-16   5.55112e-17  -1.11022e-16
 -5.55112e-17  6.66134e-16  -1.38778e-16   1.38778e-16
  4.85723e-17  6.66134e-16  -4.16334e-17  -2.22045e-16
 -2.77556e-17  2.22045e-16   5.55112e-17   2.22045e-16

In [23]:
# Positive definite matrix
n=100
A=rand(n,n)
A=Matrix(Symmetric(A'*A));

In [33]:
@time λ,U=myJacobi(A)
norm(U'*U-I),norm(A*U-U*Diagonal(λ))


  3.517880 seconds (233.61 k allocations: 8.710 GiB, 18.22% gc time)
Out[33]:
(1.8382382339947024e-13, 3.472896953951222e-11)

In [25]:
λ


Out[25]:
100-element Array{Float64,1}:
    4.412809798401899
 2537.6642528413263
   32.11279820896945
   29.994614748386997
    0.015325201797605886
    3.0662276640018677
   15.74916649097252
    0.0039025137619321756
   29.69183936656711
   25.85674533572733
    1.91214693026913e-6
    0.11483951072551898
    0.028374888575368136
    ⋮
    8.213546011760384
    8.394069232411645
    3.499270813206219
    3.3687495340126166
    7.185914315021575
    4.999679952206746
    4.972546874965672
    5.084460247190997
    6.505982621773337
    5.378942108974826
    6.269460182741506
    5.981493344044229

In [26]:
cond(A)


Out[26]:
1.3271282697880414e9

In [27]:
# Now the standard QR method
λₛ,Uₛ=eigen(A);

In [29]:
norm(Uₛ'*Uₛ-I),norm(A*Uₛ-Uₛ*Diagonal(λₛ))


Out[29]:
(9.343867008498251e-14, 1.775956270508059e-12)

myJacobi() is accurate but very slow. Notice the extremely high memory allocation.

The two key elements to reducing the allocations are:

  1. make sure variables don't change type within a function, and
  2. reuse arrays in hot loops.

Here we will simply use the in-place multiplication routines which are in Julia denoted by !.


In [34]:
@time eigen(A);


  0.005880 seconds (16 allocations: 272.156 KiB)

In [35]:
@time myJacobi(A);


  3.500785 seconds (233.61 k allocations: 8.710 GiB, 18.06% gc time)

In [41]:
function myJacobi(A1::Array{T}) where T<:Real
    A=deepcopy(A1)
    n=size(A,1)
    U=Matrix{T}(I,n,n)
    # Tolerance for rotation
    tol=sqrt(map(T,n))*eps(T)
    # Counters
    p=n*(n-1)/2
    sweep=0
    pcurrent=0
    # First criterion is for standard accuracy, second one is for relative accuracy
    # while sweep<30 && norm(A-Diagonal(diag(A)))>tol
    while sweep<30 && pcurrent<p
        sweep+=1
        # Row-cyclic strategy
        for i = 1 : n-1 
            for j = i+1 : n
                # Check for the tolerance - the first criterion is standard,
                # the second one is for relative accuracy for PD matrices               
                # if A[i,j]!=zero(T)
                if abs(A[i,j])>tol*sqrt(abs(A[i,i]*A[j,j]))
                    # Compute c and s
                    τ=(A[i,i]-A[j,j])/(2*A[i,j])
                    t=sign(τ)/(abs(τ)+sqrt(1+τ^2))
                    c=1/sqrt(1+t^2)
                    s=c*t
                    G=LinearAlgebra.Givens(i,j,c,s)
                    # A=G*A
                    lmul!(G,A)
                    # A*=G'
                    rmul!(A,adjoint(G))
                    A[i,j]=zero(T)
                    A[j,i]=zero(T)
                    # U*=G'
                    rmul!(U,adjoint(G))
                    pcurrent=0
                else
                    pcurrent+=1
                end
            end
        end
    end
    diag(A), U
end


Out[41]:
myJacobi (generic function with 1 method)

In [43]:
@time λ,U=myJacobi(A);


  0.011782 seconds (10 allocations: 157.734 KiB)

In [44]:
norm(U'*U-I),norm(A*U-U*Diagonal(λ))


Out[44]:
(1.8382430885369045e-13, 3.4728905160977044e-11)

Relative perturbation theory

$A$ is a real symmetric PD matrix of order $n$ and $A=U\Lambda U^T$ is its EVD.

Definition

The scaled matrix of the matrix $A$ is the matrix $$ A_S=D^{-1} A D^{-1}, \quad D=\mathop{\mathrm{diag}}(\sqrt{A_{11}},\sqrt{A_{22}},\ldots,\sqrt{A_{nn}}). $$

Facts

  1. The above diagonal scaling is nearly optimal (van der Sluis): $$ \kappa_2(A_S)\leq n \min\limits_{D=\mathrm{diag}} \kappa(DAD) \leq n\kappa_2(A). $$

  2. Let $A$ and $\tilde A=A+\Delta A$ both be positive definite, and let their eigenvalues have the same ordering. Then $$ \frac{|\lambda_i-\tilde\lambda_i|}{\lambda_i}\leq \frac{\| D^{-1} (\Delta A) D^{-1}\|_2}{\lambda_{\min} (A_S)}\equiv \|A_S^{-1}\|_2 \| \Delta A_S\|_2. $$ If $\lambda_i$ and $\tilde\lambda_i$ are simple, then $$ \|U_{:,i}-\tilde U_{:,i}\|_2 \leq \frac{\| A_S^{-1}\|_2 \|\Delta A_S\|_2} {\displaystyle\min_{j\neq i}\frac{|\lambda_i-\lambda_j|}{\sqrt{\lambda_i\lambda_j}}}. $$ These bounds are much sharper than the standard bounds for matrices for which $\kappa_2(A_S)\ll \kappa_2(A)$.

  3. The Jacobi method with the relative stopping criterion $$ |A_{ij}|\leq tol \sqrt{A_{ii}A_{jj}}, \quad \forall i\neq j, $$ and some user defined tolerance $tol$ (usually $tol=n\varepsilon$), computes the EVD with small scaled backward error $$ \|\Delta A_S\|\leq \varepsilon\, O(\|A_S\|_2)\leq O(n)\varepsilon, $$ provided that $\kappa_2([A_k]_S)$ does not grow much during the iterations. There is overwhelming numerical evidence that the scaled condition does not grow much, and the growth can be monitored, as well.

The proofs of the above facts are in J. Demmel and K. Veselić, Jacobi's method is more accurate than QR.

Example - Scaled matrix


In [45]:
D=Diagonal([1,2,3,4,1000])


Out[45]:
5×5 Diagonal{Int64,Array{Int64,1}}:
 1  ⋅  ⋅  ⋅     ⋅
 ⋅  2  ⋅  ⋅     ⋅
 ⋅  ⋅  3  ⋅     ⋅
 ⋅  ⋅  ⋅  4     ⋅
 ⋅  ⋅  ⋅  ⋅  1000

In [48]:
Random.seed!(431)
n=6
A=rand(n,n)
A=Matrix(Symmetric(A'*A));
Aₛ=[A[i,j]/sqrt(A[i,i]*A[j,j]) for i=1:n, j=1:n]


Out[48]:
6×6 Array{Float64,2}:
 1.0       0.826198  0.599379  0.57707   0.95549   0.904854
 0.826198  1.0       0.875302  0.835009  0.939256  0.749163
 0.599379  0.875302  1.0       0.979583  0.728843  0.550308
 0.57707   0.835009  0.979583  1.0       0.707762  0.593786
 0.95549   0.939256  0.728843  0.707762  1.0       0.878301
 0.904854  0.749163  0.550308  0.593786  0.878301  1.0

In [49]:
A


Out[49]:
6×6 Array{Float64,2}:
 2.55752  2.00174  1.28826  1.27313  1.69833  1.62083
 2.00174  2.29524  1.78224  1.74518  1.58156  1.27128
 1.28826  1.78224  1.80629  1.81622  1.08871  0.82842
 1.27313  1.74518  1.81622  1.90313  1.0852   0.91752
 1.69833  1.58156  1.08871  1.0852   1.2353   1.0934
 1.62083  1.27128  0.82842  0.91752  1.0934   1.25459

In [50]:
cond(Aₛ), cond(A)


Out[50]:
(2150.5714115462656, 2282.351517374101)

In [51]:
# We add a strong scaling
D=exp.(50*(rand(n).-0.5))


Out[51]:
6-element Array{Float64,1}:
      2.2271337290182809e-10
      1.2274296893634574e9
     18.317196864288768
 186314.1160431326
      0.0006212720403915135
      1.5274198482771623e-10

In [52]:
H=Diagonal(D)*Aₛ*Diagonal(D)


Out[52]:
6×6 Array{Float64,2}:
 4.96012e-20       0.225854      2.44516e-9  …       1.32207e-13  3.0781e-20
 0.225854          1.50658e18    1.96795e10     716246.0          0.140453
 2.44516e-9        1.96795e10  335.52                0.0082942    1.53965e-9
 2.39453e-5        1.90956e14    3.34307e6          81.9247       1.6898e-5
 1.32207e-13  716246.0           0.0082942           3.85979e-7   8.33457e-14
 3.0781e-20        0.140453      1.53965e-9  …       8.33457e-14  2.33301e-20

In [53]:
# Now we scale again
Hₛ=[H[i,j]/sqrt(H[i,i]*H[j,j]) for i=1:n, j=1:n]


Out[53]:
6×6 Array{Float64,2}:
 1.0       0.826198  0.599379  0.57707   0.95549   0.904854
 0.826198  1.0       0.875302  0.835009  0.939256  0.749163
 0.599379  0.875302  1.0       0.979583  0.728843  0.550308
 0.57707   0.835009  0.979583  1.0       0.707762  0.593786
 0.95549   0.939256  0.728843  0.707762  1.0       0.878301
 0.904854  0.749163  0.550308  0.593786  0.878301  1.0

In [54]:
cond(Hₛ),cond(H)


Out[54]:
(2150.571411546287, 8.242988204016304e38)

In [55]:
# Jacobi method
λ,U=myJacobi(H)


Out[55]:
([6.639338347637239e-22, 1.506583666534153e18, 9.917294095924937, 1.0509671246976448e10, 2.5910948129747017e-8, 3.602046700706043e-21], [0.8289591326629637 1.4991113506170175e-19 … 6.350474463617169e-7 0.5593091778026386; 1.3254344212068435e-19 0.9999999919674959 … -6.809263533260158e-13 1.4302362494444017e-19; … ; -3.5244148760375195e-7 4.754108710630448e-13 … 0.9999999993901578 -6.130560165133361e-7; -0.559309177802888 9.322625784252507e-20 … 3.11074624475631e-7 0.8289591326629803])

In [56]:
# Standard QR method
λ₁,U₁=eigen(H)


Out[56]:
Eigen{Float64,Float64,Array{Float64,2},Array{Float64,1}}
values:
6-element Array{Float64,1}:
 2.6829263829812304e-21
 4.960124646930873e-20
 2.5910948129583934e-8
 9.917294095827494
 1.0509671246976444e10
 1.5065836665341514e18
vectors:
6×6 Array{Float64,2}:
 -0.0290221    1.0   6.35047e-7    1.28032e-11   4.45413e-16  1.49911e-19
  4.44093e-20  0.0  -6.80926e-13   2.82638e-9    0.000126748  1.0
  1.15155e-11  0.0   3.49168e-5   -1.0          -8.07583e-5   1.30623e-8
 -1.10605e-15  0.0  -1.97699e-9    8.07583e-5   -1.0          0.000126748
 -3.10944e-7   0.0   1.0           3.49168e-5    8.42833e-10  4.75411e-13
  0.999579     0.0   3.11075e-7    2.23821e-11   8.60325e-17  9.32263e-20

In [64]:
# Compare
[sort(λ) sort(λ₁)]


Out[64]:
6×2 Array{Float64,2}:
 6.63934e-22  2.68293e-21
 3.60205e-21  4.96012e-20
 2.59109e-8   2.59109e-8
 9.91729      9.91729
 1.05097e10   1.05097e10
 1.50658e18   1.50658e18

In [70]:
λ[1]


Out[70]:
6.639338347637239e-22

In [67]:
sort(λ)-sort(λ₁)


Out[67]:
6-element Array{Float64,1}:
   -2.0189925482175063e-21
   -4.599919976860269e-20
    1.6308362084850472e-19
    9.744383078214014e-11
    3.814697265625e-6
 1536.0

In [66]:
(sort(λ)-sort(λ₁))./sort(λ)


Out[66]:
6-element Array{Float64,1}:
  -3.040954448323922
 -12.770295221210295
   6.294004373436141e-12
   9.825646979873298e-12
   3.6297018013027375e-16
   1.0195251907473007e-15

In [71]:
# Check with BigFloat
λ₂,U₂=myJacobi(map(BigFloat,H))
λ₂


Out[71]:
6-element Array{BigFloat,1}:
 6.639338347636978529927229993857181868728832091544013581877936779166890586722408e-22
 1.506583666534152515559851410423035778679052938516648886706051024540631281201094e+18
 9.917294095924888174393482177270077731209297684093460115049788286542808522740377
 1.05096712469764394858225098052559387058258270604718865433068736387455737534328e+10
 2.591094812974685868334656643489241678555151754233661403589069371029847883253728e-08
 3.602046700706017132058904286277989221215849193663410894313463527192708721064271e-21

In [69]:
# Relative error is eps()*cond(AS)
map(Float64,(sort(λ₂)-sort(λ))./sort(λ₂))


Out[69]:
6-element Array{Float64,1}:
 -3.926102380007449e-14
 -7.128907915321807e-15
 -6.127447935327731e-15
 -4.961642322799733e-15
 -8.157495434418286e-16
 -2.9499865056415837e-16

Indefinite matrices

Definition

Spectral absolute value of the matrix $A$ is the matrix

$$ |A|_{\mathrm{spr}}=(A^2)^{1/2}. $$

This is positive definite part of the polar decomposition of $A$.

Facts

  1. The above perturbation bounds for positive definite matrices essentially hold with $A_S$ replaced by $[|A|_{\mathrm{spr}}]_S$.

  2. Jacobi method can be modified to compute the EVD with small backward error $\| \Delta [|A|_{\mathrm{spr}}]_S\|_2$.

The details of the indefinite case are beyond the scope of this course, and the reader should consider references.


In [ ]: